Sublinear Projective Clustering with Outliers
Abstract
Given a set of n points in R^d, a family of shapes S, and a number of clusters k, the projective clustering problem is to find a collection of k shapes in S such that the maximum distance from a point to its nearest shape is minimized. Special cases include the k-line center problem, where the goal is to cover the points with k hypercylinders of minimum radius, and the k-hyperplane center problem, where the goal is to cover the points with k slabs of minimum width. In practice, projective clustering algorithms are often used as a dimension reduction technique to enable more effective data representation for indexing and data mining on massive datasets (see, for example, [8, 9]). In typical applications the number of points n is extremely large, the dimensionality d is large, and the data contains some outliers, while the number of clusters k is small. Consequently, this paper emphasizes the running times of the algorithms. We present, for the first time, sublinear-time randomized algorithms for the k-line center and k-hyperplane center problems; their running times are independent of n.
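To make the objective concrete, the following minimal Python sketch evaluates the k-line center cost of a candidate set of lines: the largest distance from any point to its nearest line. It only scores a given solution and is not the sublinear algorithm of the paper. The function names and the `num_outliers` parameter are illustrative assumptions; in particular, handling outliers by discarding the farthest points is one common convention assumed here, not a definition taken from the abstract.

```python
import math


def dist_point_to_line(p, a, u):
    """Euclidean distance from point p to the line {a + t*u}; u must be nonzero."""
    diff = [pi - ai for pi, ai in zip(p, a)]
    uu = sum(ui * ui for ui in u)
    # Parameter of the orthogonal projection of p onto the line.
    t = sum(di * ui for di, ui in zip(diff, u)) / uu
    residual = [di - t * ui for di, ui in zip(diff, u)]
    return math.sqrt(sum(r * r for r in residual))


def k_line_center_cost(points, lines, num_outliers=0):
    """Max distance from a point to its nearest line.

    Assumption (not from the abstract): outliers are handled by ignoring
    the num_outliers points that are farthest from every line.
    """
    dists = sorted(
        min(dist_point_to_line(p, a, u) for a, u in lines) for p in points
    )
    kept = dists[: len(dists) - num_outliers]
    return kept[-1] if kept else 0.0


# Hypothetical data: five points near the x- and y-axes plus one far-away point.
points = [(0, 0, 0), (1, 0, 1), (2, 0, -1), (0, 5, 0.5), (0, 7, 0), (10, 10, 10)]
# Each line is (point on line, direction vector); here k = 2.
lines = [((0, 0, 0), (1, 0, 0)), ((0, 0, 0), (0, 1, 0))]

cost_all = k_line_center_cost(points, lines)
cost_robust = k_line_center_cost(points, lines, num_outliers=1)
print(cost_all, cost_robust)
```

With no outliers allowed, the single distant point dominates the cost; allowing one outlier reduces the cost to the radius of the hypercylinders that cover the remaining points.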
Similar resources
Projective Clustering Method for the Detection of Outliers in Non-Axis Aligned Subspaces
Clustering in non-axis-aligned subspaces and detecting outliers are major challenges due to the curse of dimensionality. Conventional clustering is efficient only in axis-aligned subspaces. To solve this problem, projective clustering has been defined as an extension of traditional clustering that attempts to find projected clusters in subsets of the dimensions of a data space. A pro...
Linear Time Algorithm for Projective Clustering
Projective clustering is a problem of both theoretical and practical importance and has received a great deal of attention in recent years. Given a set of points P in R^d, projective clustering is to find a set F of k lower-dimensional j-flats so that the average distance (or squared distance) from points in P to their closest flats is minimized. Existing approaches for this problem are ...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
A robust wavelet based profile monitoring and change point detection using S-estimator and clustering
Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications, due to the complexity of many processes, it is usually not possible to model a process using parametric models. In these cas...
Clustering cancer gene expression data by projective clustering ensemble
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool to analyze gene expression data. Gene expression data is often characterized by a large number of genes but limited samples, thus various projective clustering techniques and ensemble techniques have been suggested to combat with th...